Toward Reliable and Rapid Elasticity for Streaming Dataflows on Clouds

Authors

  • Anshu Shukla
  • Yogesh L. Simmhan
Abstract

The pervasive availability of streaming data is driving interest in distributed Fast Data platforms for streaming applications. Such latency-sensitive applications need to respond to dynamism in the input rates and task behavior by scaling in and out on elastic Cloud resources. Platforms like Apache Storm do not provide robust capabilities for responding to such dynamism or for rapid task migration across VMs. We propose several dataflow checkpoint and migration approaches that allow a running streaming dataflow to migrate without any loss of in-flight messages or internal task states, while reducing the time to recover and stabilize. We implement and evaluate these migration strategies on Apache Storm using micro and application dataflows, scaling in and out on 2 to 21 Azure VMs. Our results show that we can migrate large dataflows within 50 sec, compared to over 100 sec for Storm's default approach. We also find that our approaches stabilize the application much earlier, with no failure or re-processing of messages.

Similar resources

A Relational Approach to Complex Dataflows

Clouds have become an attractive platform for highly scalable processing of Big Data, especially due to the concept of elasticity, which characterizes them. Several languages and systems for cloud-based data processing have been proposed in the past, with the most popular among them being based on MapReduce [7]. In this paper, we present Exareme, a system for elastic large-scale data processing...

A Method to Reduce Effects of Packet Loss in Video Streaming Using Multiple Description Coding

Multiple description (MD) coding has evolved as a promising technique for promoting error resiliency of multimedia system in real-time application programs over error-prone communicational channels. Although multiple description lattice vector quantization (MDCLVQ) is an efficient method for transmitting reliable data in the context of potential error channels, this method doesn’t consider disc...

Towards Elastic Stream Processing: Patterns and Infrastructure

Distributed, highly parallel processing frameworks such as Hadoop are deemed to be state-of-the-art for handling big data today. But they burden application developers with the task of manually implementing program logic using low-level batch processing APIs. Thus, a movement can be observed toward high-level languages that allow dataflows to be modeled declaratively and automatically optim...

Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...

Realizing a Self-Adaptive Network Architecture for HPC Clouds

Clouds offer significant advantages over traditional cluster computing architectures including ease of deployment, rapid elasticity, and an economically attractive pay-as-you-go business model. However, the effectiveness of cloud computing for HPC systems still remains questionable. When clouds are deployed on lossless interconnection networks, challenges related to load-balancing, low-overhead...

Journal title:
  • CoRR

Volume abs/1712.00605

Publication date 2017